Abstract

Untreated diabetic retinopathy, a consequence of poorly managed chronic diabetes, can lead to complete
vision loss. Early diagnosis and treatment are crucial to prevent severe complications. Currently,
ophthalmologists dedicate significant time to manually diagnose diabetic retinopathy, causing discomfort to
patients. Automated technologies offer a promising solution by swiftly identifying diabetic retinopathy and
facilitating timely treatment to mitigate further ocular damage. This study proposes leveraging machine
learning to extract and classify key features such as exudates, hemorrhages, and microaneurysms using a
hybrid classifier combining support vector machines, k-nearest neighbors, and random forests.
Keywords: Diabetic Retinopathy; Machine Learning; KNN; Random Forest; SVM.


1. Introduction 

Diabetes, a chronic condition affecting millions 
worldwide, presents a spectrum of complications, 
among which diabetic retinopathy stands out for 
its potential to cause irreversible vision 
impairment, even leading to total blindness in 
severe cases. Early symptoms such as eye 
floaters, blurred vision, darkened areas, and color 
perception difficulties serve as critical indicators 
of diabetic retinopathy's onset. Timely and 
accurate diagnosis during these early stages is 
paramount in preventing irreversible vision loss. 
This research addresses the challenge of early 
detection through automated computer-aided 
methods, specifically focusing on extracting and 
analyzing key features—hemorrhages, 
microaneurysms, and exudates—from retinal 
images. Leveraging a hybrid machine learning 
approach, combining Support Vector Machines 
(SVM) and k-Nearest Neighbors (KNN), the 
proposed model aims to enhance diagnostic 
accuracy and efficiency in identifying diabetic 
retinopathy. By integrating these advanced 
technologies, the study endeavors to improve 
patient outcomes and mitigate the devastating 
impact of diabetic retinopathy on visual health. 


2. Literature Review 

Recent advancements in deep learning have 
revolutionized various fields, particularly in the 
realm of medical image classification and analysis. 
Among these innovations, convolutional neural 
networks (CNNs) have emerged as highly effective 
tools for processing medical images due to their 
robustness and efficiency. This literature review 
examines current methodologies in the classification 
and detection of diabetic retinopathy (DR) using deep 
learning algorithms, reflecting on their efficacy and 
application in analyzing color fundus images [1]. The 
availability and analysis of comprehensive color 
fundus retina datasets for DR have also been a focal 
point, underscoring the importance of robust datasets 
in training and validating deep learning models. Early 
detection of diseases like diabetic retinopathy is 
critical in medical practice as it significantly 
enhances the effectiveness of treatment interventions. 
Diabetes, a widespread chronic condition affecting 
425 million adults globally, is characterized by 
insulin deficiency and elevated blood glucose levels. 
Its impact extends beyond metabolic disturbances, 
affecting vital organs such as the kidneys, heart, 
nerves, and notably, the retina. The retina, being 
highly sensitive to fluctuations in blood glucose 
levels, is particularly susceptible to diabetic
retinopathy, which, if left untreated, can lead to 
severe vision impairment and even blindness. As 
such, leveraging deep learning techniques for the 
early detection and classification of diabetic 
retinopathy holds promise in improving patient 
outcomes by enabling timely intervention and 
management strategies. This review synthesizes 
current research efforts aimed at harnessing deep 
learning's potential in advancing medical imaging 
diagnostics, with a specific focus on diabetic 
retinopathy detection and classification [2-4]. 

3. Methodologies 

Machine learning, a prominent subfield of artificial 
intelligence and computer science, focuses on 
developing algorithms that learn from data to 
improve accuracy over time, emulating human 
learning processes. This section describes
the application of machine learning in a healthcare
setting, specifically in managing and analyzing 
patient data within a comprehensive system 
comprising four modules: admin, doctor, patient, and 
lab. The admin module serves as the central authority, 
allowing administrators to manage healthcare 
professionals, facilities (such as labs and hospitals), 
and oversee patient details securely through 
authentication with a valid email address. This 
administrative oversight ensures organizational 
efficiency and regulatory compliance. Within the 
doctor module, healthcare providers access a tailored 
interface enabling them to add new patients, update 
their medical records, and manage their own 
professional profiles securely. This functionality 
enhances patient management and facilitates timely 
healthcare interventions. Patients, utilizing the 
patient module, log in securely with their credentials 
to access personalized health information and 
medical records. This patient-centric approach 
empowers individuals to actively engage in their 
healthcare journey, promoting transparency and 
informed decision-making. The lab module plays a 
crucial role in the diagnostic process by examining 
patient medical data, including retinal images, to 
predict diabetic retinopathy. Leveraging advanced 
image processing and machine learning techniques, 
labs contribute to early detection and intervention
strategies, thereby improving patient outcomes and
reducing healthcare costs associated with advanced 
disease stages. This integrated system exemplifies 
how machine learning technologies enhance 
healthcare delivery by streamlining administrative 
processes, facilitating data-driven medical decisions, 
and advancing diagnostic capabilities. By harnessing 
these advancements, healthcare systems can achieve 
greater efficiency, accuracy, and patient satisfaction, 
ultimately fostering a more proactive approach to 
healthcare management and disease prevention. 

3.1. Machine Learning 

Artificial intelligence (AI), specifically through 
machine learning (ML), empowers computer 
programs to predict outcomes with heightened 
accuracy, leveraging historical data as input without 
explicit programming of every possible scenario. 
Machine learning algorithms achieve this by 
analyzing patterns and relationships within data to 
predict output values for new inputs. To
elaborate on the methodology, machine learning 
operates through several key stages. Initially, data is 
collected and preprocessed to ensure quality and 
relevance. This involves cleaning the data, handling 
missing values, and transforming features to make 
them suitable for analysis. Subsequently, the data is 
divided into training and testing sets. The training set 
is used to train the machine learning model, where the 
algorithm learns from the input data to identify 
patterns and correlations. Once trained, the model is 
evaluated using the testing set to assess its predictive 
performance. Various metrics such as accuracy, 
precision, recall, and F1-score are employed to gauge
the model's effectiveness in making predictions. 
Iterative refinement may occur by fine-tuning 
parameters, adjusting the model architecture, or 
employing feature selection techniques to optimize 
performance. In practical applications, AI and ML 
are deployed across diverse domains, from healthcare 
and finance to marketing and autonomous vehicles. 
Their ability to handle large volumes of data, identify 
complex patterns, and make data-driven decisions 
has revolutionized industries, leading to more 
efficient processes, improved decision-making, and 
enhanced outcomes. 
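The split-then-evaluate workflow described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the implementation used in the study; the function names, the 25% test fraction, and the fixed seed are all assumptions made for the example.

```python
import random

def train_test_split(samples, labels, test_fraction=0.25, seed=0):
    """Shuffle indices reproducibly and split them into training and testing sets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(round(len(samples) * test_fraction)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [samples[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Toy data (hypothetical): 12 one-feature samples with binary labels.
samples = [[v] for v in range(12)]
labels = [1 if v > 5 else 0 for v in range(12)]
X_train, X_test, y_train, y_test = train_test_split(samples, labels)
print(len(X_train), len(X_test))
```

Any classifier can then be fitted on `X_train`, `y_train` and scored with `accuracy` on the held-out `X_test`, `y_test`.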


International Research Journal on Advanced Engineering and Management
https://goldncloudpublications.com
e-ISSN: 2584-2854, Volume: 02, Issue: 07, July 2024, Page No: 2134-2139
https://doi.org/10.47392/IRJAEM.2024.0312


3.2. K Nearest Neighbour (KNN) Algorithm
Step 1: Choose the number K of neighbours.
Step 2: Calculate the Euclidean distance between the
new data point and all points in the dataset.
Step 3: Select the K closest neighbours based
on the Euclidean distance.
Step 4: Determine the class labels of the K
nearest neighbours.
Step 5: Assign the new data point to the class that is
most common among its K nearest neighbours.
Step 6: End of the algorithm; the model is now ready
for predictions.
This algorithm utilizes the Euclidean distance metric
to find similarities between data points and assigns
new points to the category most prevalent among
their nearest neighbours.
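The steps above can be sketched as follows. This is a minimal standard-library illustration rather than the code used in the study; the feature vectors (imagined here as counts of exudates, hemorrhages, and microaneurysms) and the labels are hypothetical toy values.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: distance from the query to every training point.
    distances = [(euclidean(x, query), label) for x, label in zip(train_X, train_y)]
    # Step 3: keep the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    # Steps 4-5: count the labels of the neighbours and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy features: [exudates, hemorrhages, microaneurysms] per image.
train_X = [[0, 0, 1], [1, 0, 0], [0, 1, 0], [6, 5, 7], [7, 6, 5], [5, 7, 6]]
train_y = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]
print(knn_predict(train_X, train_y, [6, 6, 6], k=3))
```

Because KNN stores the training set and defers all computation to prediction time, the choice of K and the scaling of the features largely determine its behaviour.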
3.3. Support Vector Machine (SVM) Algorithm
Step 1: Import the necessary libraries.
Step 2: Pre-process the dataset:
• Handle missing values.
• Encode categorical variables.
• Scale numerical features if necessary.
Step 3: Instantiate and train the Support Vector
Machine model:
• Select the appropriate SVM variant based on
the problem (e.g., linear SVM, kernel SVM).
• Fit the SVM model to the pre-processed
training data.
This algorithm involves importing required libraries,
preparing the dataset through preprocessing steps
such as handling missing data and scaling features,
and then training the SVM model to make predictions
based on the input data.
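To make the training step concrete, the sketch below fits a linear SVM by subgradient descent on the L2-regularised hinge loss. It is a simplified stand-in, not the library-based model the steps refer to: it covers only the linear variant, assumes the features are already scaled and centred, and uses hypothetical toy data and hyperparameters (`lam`, `lr`, `epochs`).

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=100):
    """Fit weights w and bias b for labels y in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:
                # Inside the margin or misclassified: hinge-loss subgradient plus L2 shrinkage.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Correct with enough margin: only the L2 shrinkage applies.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def svm_predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical, already centred toy features; -1 = normal, +1 = abnormal.
X = [[-2, -2], [-1, -2], [-2, -1], [2, 2], [1, 2], [2, 1]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print([svm_predict(w, b, x) for x in X])
```

A kernel SVM replaces the dot product with a kernel function and is normally trained with a dedicated solver rather than this simple update rule.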
3.4. Random Forest Algorithm
Step 1: Randomly select K data points from the
training set.
Step 2: Construct decision trees using the selected
data points:
• Each tree is built independently and uses a
random subset of features for splitting nodes.
Step 3: Specify the number N of decision trees to
build.
Step 4: Repeat Steps 1 and 2 to create N decision
trees in the forest.
Step 5: For a new data point, classify it by
aggregating predictions from all decision trees:
• Each tree gives a classification, and the final
prediction is the majority class among all
trees (for classification tasks).
This algorithm leverages the power of ensemble
learning by constructing multiple decision trees and
aggregating their predictions to achieve robust and
accurate classifications for new data points. The
overall flow of the detection model is shown
in Figure 1.
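The random forest steps can be illustrated with a deliberately small sketch. To stay short it uses depth-1 trees (decision stumps) instead of full decision trees, which is a simplification and not the configuration reported in the study; the bootstrap sampling, random feature subsets, and majority vote follow the steps above. All names, parameters, and data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(X, y, feature_ids):
    """Best single split (feature, threshold, left label, right label) by weighted Gini."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:
        # No valid split (e.g. all sampled rows identical): always predict the majority.
        majority = Counter(y).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]

def fit_forest(X, y, n_trees=25, sample_size=None, n_features=1, seed=0):
    """Steps 1-4: build n_trees stumps, each on a bootstrap sample and a feature subset."""
    rng = random.Random(seed)
    k = sample_size or len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(k)]       # Step 1: sample K points
        feats = rng.sample(range(len(X[0])), n_features)       # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, x):
    """Step 5: majority vote over the predictions of all trees."""
    votes = [left if x[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical toy data: two features, label 0 = normal, 1 = abnormal.
X = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 8], [9, 8], [8, 9], [9, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [10, 10]))
```

Sampling both the rows and the candidate features makes the individual trees differ from one another, which is what makes the aggregated vote more stable than any single tree.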


Figure 1 Flow of Diabetic Retinopathy Detection Model


3.5. Datasets 

Our dataset comprises a diverse collection of
high-resolution retinal images acquired under various
imaging conditions. Each subject is represented with
images from both their left and right eyes, identified
by a unique subject ID and the eye designation (e.g.,
"1_left.jpeg" denotes the left eye of patient 1). The
dataset encompasses a substantial volume of
high-quality retinal images captured using different
camera models and types, leading to variations in
how the left and right eye images appear. The dataset
undergoes pre-processing to prepare input images
according to the standard requirements of the
proposed system. This process aims to enhance
microscopic image quality by mitigating undesired
distortions and accentuating crucial image attributes
necessary for subsequent analyses. Typical
pre-processing operations involve resizing images,
removing noise, and eliminating artifacts that could
mislead interpretations. This step is particularly
beneficial for accurate identification and
categorization of red blood cells. The RGB images
are converted to grayscale, and further enhancement
is achieved using a median filter to refine borders,
identify relevant components, and reduce noise
levels.
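The pre-processing operations named above (resizing, grayscale conversion, and median filtering) can be sketched on plain nested lists. This is an illustrative standard-library version, not the pipeline used in the study; the nearest-neighbour resize, the luminance weights 0.299/0.587/0.114, and the 3x3 window are assumptions chosen for the example.

```python
def resize_nearest(img, new_h, new_w):
    """Resize a 2-D image (list of rows) by nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[i * h // new_h][j * w // new_w] for j in range(new_w)]
            for i in range(new_h)]

def to_grayscale(rgb):
    """Convert rows of (R, G, B) tuples to grayscale using common luminance weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]

def median_filter(gray, size=3):
    """Replace each pixel by the median of its size x size neighbourhood."""
    h, w = len(gray), len(gray[0])
    k = size // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Windows are clipped at the image border.
            window = sorted(gray[ii][jj]
                            for ii in range(max(0, i - k), min(h, i + k + 1))
                            for jj in range(max(0, j - k), min(w, j + k + 1)))
            out[i][j] = window[len(window) // 2]
    return out

# A flat 5x5 patch with one bright noisy pixel in the centre.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
print(median_filter(noisy)[2][2])
```

The median filter suppresses isolated outlier pixels while preserving edges better than a mean filter of the same size, which is why it is a common choice before segmentation.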

3.6. System Architecture 

The system begins with inputting the image dataset, 
followed by a series of preprocessing steps. Initially, 
each image undergoes resizing to standardize 
dimensions across all inputs, addressing variations in 
sizes captured by different cameras. Subsequently, 
noise reduction techniques are applied to enhance 
image clarity. The next stage involves image 
segmentation and morphology operations. Here, the 
image is segmented to distinguish foreground objects 
from the background, with additional noise reduction 
techniques employed to refine segmentation quality. 
Following segmentation, features such as exudates, 
hemorrhages, and microaneurysms are extracted 
from the images. These features serve as crucial 
indicators for subsequent classification tasks. In the 
classification step, the extracted features are utilized 
to classify the image as normal or abnormal. This 
classification leverages machine learning algorithms 
to predict the health status of the retina based on the 
extracted features. Overall, this systematic
approach, from preprocessing and segmentation to
feature extraction and classification, facilitates
accurate assessment of retinal images, aiding in
medical diagnostics and treatment decisions. The
architecture of the proposed model is shown in Figure 2.
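The segmentation and morphology stage can be illustrated with a simple threshold followed by a morphological opening (erosion, then dilation). The sketch below is a standard-library approximation under stated assumptions: a fixed global threshold of 128 and a 3x3 square structuring element, neither of which is specified in the text.

```python
def threshold(gray, t):
    """Foreground mask: 1 where the pixel is brighter than t, else 0."""
    return [[1 if p > t else 0 for p in row] for row in gray]

def _neighbourhood(mask, i, j):
    """Values in the 3x3 window around (i, j); out-of-bounds positions count as 0."""
    h, w = len(mask), len(mask[0])
    return [mask[i + di][j + dj] if 0 <= i + di < h and 0 <= j + dj < w else 0
            for di in (-1, 0, 1) for dj in (-1, 0, 1)]

def erode(mask):
    """A pixel stays 1 only if its whole 3x3 neighbourhood is 1."""
    return [[int(all(_neighbourhood(mask, i, j))) for j in range(len(mask[0]))]
            for i in range(len(mask))]

def dilate(mask):
    """A pixel becomes 1 if any pixel in its 3x3 neighbourhood is 1."""
    return [[int(any(_neighbourhood(mask, i, j))) for j in range(len(mask[0]))]
            for i in range(len(mask))]

def opening(mask):
    """Erosion followed by dilation: removes specks smaller than the structuring element."""
    return dilate(erode(mask))

# 7x7 grayscale patch: a 3x3 bright blob plus one isolated bright pixel (noise).
gray = [[0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        gray[r][c] = 200
gray[0][6] = 200

cleaned = opening(threshold(gray, 128))
print(sum(map(sum, cleaned)))  # number of foreground pixels left after opening
```

Counting the connected regions that survive such a step is one way the extracted lesion candidates could be turned into numeric features for the classifiers.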


Figure 2 Architecture of Proposed Model


4. Results And Experimental Analysis 

Figure 3 illustrates a comparison of accuracy
between the SVM and KNN algorithms. The X-axis 
denotes the accuracy score, while the Y-axis 
represents each algorithm. According to the results, 
the SVM algorithm achieves an accuracy score of 0.8, 
whereas the KNN algorithm achieves 0.65. These 
findings highlight the superior performance of SVM 
over KNN in this evaluation. Such insights are crucial 
for selecting the most effective algorithm for specific 
machine learning tasks. 


Figure 3 Accuracy Comparison Graph (X-axis: Accuracy; Y-axis: Algorithm)


Figure 4 displays the precision score, recall score,
F1-score, support, confusion matrix, and accuracy score
for the KNN classifier, achieving an accuracy of 0.65.
In contrast, Figure 5 presents these metrics for the
SVM classifier, which achieved an accuracy score of
0.8. The F1 score, the harmonic mean of precision and
recall, provides a comprehensive measure of the model's
predictive performance. Accuracy assesses the ratio
of correct predictions to all predictions made.
Precision measures the proportion of correctly
predicted positive outcomes among all predicted
positives. Recall, on the other hand, evaluates the
proportion of correctly predicted positive outcomes
relative to all actual positive instances in the dataset.
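These definitions can be checked directly against the KNN confusion matrix reported in Figure 4, [[9 1] [6 4]]. The sketch below assumes the usual convention that rows are true classes and columns are predicted classes; the function name is hypothetical.

```python
def per_class_metrics(cm):
    """Compute precision, recall, F1 and support per class, plus overall accuracy.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    report = {}
    for c in range(n_classes):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n_classes))  # column sum
        actual = sum(cm[c])                                   # row sum (support)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall,
                     "f1": f1, "support": actual}
    accuracy = sum(cm[c][c] for c in range(n_classes)) / total
    return report, accuracy

knn_cm = [[9, 1], [6, 4]]
report, acc = per_class_metrics(knn_cm)
for label, m in report.items():
    print(label, round(m["precision"], 2), round(m["recall"], 2),
          round(m["f1"], 2), m["support"])
print("accuracy", acc)
```

For this matrix the computation gives precision 0.60, recall 0.90, and F1 0.72 for class 0, precision 0.80, recall 0.40, and F1 0.53 for class 1, and an overall accuracy of 13/20 = 0.65.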


Classification report :

              precision    recall  f1-score   support

         0.0       0.60      0.90      0.72        10
         1.0       0.80      0.40      0.53        10

    accuracy                           0.65        20
   macro avg       0.70      0.65      0.63        20
weighted avg       0.70      0.65      0.63        20

Confusion matrix :
[[9 1]
 [6 4]]

Accuracy score : 0.65


Figure 4 Evaluation of KNN 
Figure 5 SVM Accuracy Evaluation

Figure 6 Confusion Matrix for SVM and KNN Classifiers


Figure 6 presents the confusion matrices for both the 
SVM and KNN classifiers, detailing the true and 
predicted label values. A confusion matrix succinctly 
summarizes the predicted outcomes of a classification
task, counting accurate and inaccurate predictions per 
class. In terms of computational performance, the 
SVM algorithm consumes a total CPU time of 1.47 
seconds, with user and system times noted as 897 ms 
and 576 ms, respectively. Comparatively, the KNN 
algorithm operates with a total CPU time of 988 ms, 
comprising user and system times of 932 ms and 56 
ms, respectively. These timings provide insights into 
the computational efficiency of each algorithm in 
processing the dataset. 
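User and system CPU times of this kind can be collected with the standard library. The sketch below is one possible way to do so and is not necessarily how the figures above were obtained; `measure_cpu_time` is a hypothetical helper built on `os.times()`.

```python
import os

def measure_cpu_time(fn, *args, **kwargs):
    """Run fn and return its result with the user, system and total CPU time in seconds."""
    start = os.times()
    result = fn(*args, **kwargs)
    end = os.times()
    user = end.user - start.user
    system = end.system - start.system
    return result, {"user": user, "system": system, "total": user + system}

# Example workload standing in for a classifier's fit/predict call.
result, timing = measure_cpu_time(sum, (i * i for i in range(200_000)))
print(timing)
```

The total is the sum of the user and system components, which is consistent with the reported figures: 897 ms + 576 ms is approximately 1.47 s for SVM, and 932 ms + 56 ms = 988 ms for KNN.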

Conclusion 

The proposed method successfully identifies 
hemorrhages, exudates, and microaneurysms through 
a systematic approach. Specifically, exudates are 
effectively isolated using channel extraction, 
masking, smoothing, and bitwise green AND 
operations. Meanwhile, for hemorrhages and
microaneurysms, morphological techniques such as
opening (erosion followed by dilation) are
employed. By quantifying the occurrences of these
features in retinal images, the severity of diabetic 
retinopathy can be determined. These extracted 
features are then utilized as inputs for SVM, KNN, 
and Random Forest classifiers. The final prediction is 
derived from the combined results of these classifiers, 
categorizing the disease grade as normal or abnormal. 
Early detection and diagnosis play a pivotal role in 
preventing blindness and mitigating the progression 
of diabetic retinopathy. By leveraging these advanced 
computational techniques, this approach enhances 
diagnostic accuracy and supports timely intervention, 
thereby improving patient outcomes. In conclusion, 
the methodology of AI and machine learning 
represents a paradigm shift in computational 
capabilities, enabling systems to forecast outcomes 
accurately and efficiently based on historical data 
patterns. As these technologies continue to evolve, 
their integration into everyday applications promises 
to drive further advancements and innovations across 
various sectors. 